
Conversation

@danielhanchen

Beware - my C is very rusty (haven't done C in like ages lol) - I might have transcribed it incorrectly from https://github.com/unslothai/unsloth/blob/main/unsloth/models/llama.py#L1116

From https://news.ycombinator.com/item?id=41053201
Llama 3.1 uses a new RoPE scaling mechanism for 128K context extension:

import math
import torch

# From https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/api/model.py#L41
def apply_scaling(freqs: torch.Tensor):
    # Values obtained from grid search
    scale_factor = 8
    low_freq_factor = 1
    high_freq_factor = 4
    old_context_len = 8192  # original llama3 length

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    new_freqs = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # high-frequency band: keep the frequency unchanged
            new_freqs.append(freq)
        elif wavelen > low_freq_wavelen:
            # low-frequency band: divide by the full scale factor
            new_freqs.append(freq / scale_factor)
        else:
            # transition band: interpolate between scaled and unscaled
            assert low_freq_wavelen != high_freq_wavelen
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            new_freqs.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return torch.tensor(new_freqs, dtype=freqs.dtype, device=freqs.device)

I did not add a flag to enable Llama 3.1 scaling, though.
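
For comparison, here is a rough plain-C sketch of the same per-frequency scaling that the Python above computes, in case it helps sanity-check the transcription. The function name and the in-place float-array interface are illustrative assumptions, not llama.cpp's actual internals:

#include <math.h>
#include <stddef.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Illustrative sketch only - mirrors apply_scaling() above, scaling freqs in place.
static void llama31_rope_apply_scaling(float * freqs, size_t n_freqs) {
    // Values obtained from Meta's grid search
    const float scale_factor     = 8.0f;
    const float low_freq_factor  = 1.0f;
    const float high_freq_factor = 4.0f;
    const float old_context_len  = 8192.0f;  // original llama3 length

    const float low_freq_wavelen  = old_context_len / low_freq_factor;   // 8192
    const float high_freq_wavelen = old_context_len / high_freq_factor;  // 2048

    for (size_t i = 0; i < n_freqs; i++) {
        const float wavelen = 2.0f * (float) M_PI / freqs[i];
        if (wavelen < high_freq_wavelen) {
            // high-frequency band: keep the frequency unchanged
        } else if (wavelen > low_freq_wavelen) {
            // low-frequency band: divide by the full scale factor
            freqs[i] /= scale_factor;
        } else {
            // transition band: interpolate between scaled and unscaled
            const float smooth = (old_context_len / wavelen - low_freq_factor)
                               / (high_freq_factor - low_freq_factor);
            freqs[i] = (1.0f - smooth) * freqs[i] / scale_factor + smooth * freqs[i];
        }
    }
}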
